On Combining Side Information and Unlabeled Data for Heterogeneous Multi-Task Metric Learning

Authors

  • Yong Luo
  • Yonggang Wen
  • Dacheng Tao
Abstract

Distance metric learning (DML) is critical for a wide variety of machine learning algorithms and pattern recognition applications. Transfer metric learning (TML) leverages side information (e.g., similar/dissimilar constraints over pairs of samples) from related domains to help learn the target metric when information in the target domain is limited. Current TML tools usually assume that different domains share the same feature representation, and are thus not applicable to tasks where data are drawn from heterogeneous domains. Heterogeneous transfer learning approaches usually handle heterogeneous domains by learning feature transformations across the domains. The learned transformation can be used to derive a metric, but most of these approaches can handle only two domains. This motivates the proposed heterogeneous multi-task metric learning (HMTML) framework, which handles multiple domains by combining side information and unlabeled data. Specifically, HMTML learns the metrics for all domains simultaneously by maximizing their high-order correlation (parameterized by the feature covariance of unlabeled data) in a common subspace, which is induced by the transformations derived from the metrics. Extensive experiments on both multi-language text categorization and multi-view social image annotation demonstrate the effectiveness of the proposed method.
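The link between a metric and a transformation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' HMTML algorithm: the abstract gives no objective function, so the code only shows the standard fact that a positive semidefinite metric M = U Uᵀ yields the same distance as the Euclidean distance after projecting with Uᵀ, plus one plausible (assumed) way to form a third-order correlation tensor from unlabeled samples projected into a common subspace. The feature dimensions, the subspace size `r`, the random data, and all function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) under a PSD metric M."""
    diff = x - y
    return float(diff @ M @ diff)

def transform_from_metric(M, r):
    """Factor a PSD metric as M ~= U U^T, keeping the top-r eigenpairs."""
    eigvals, eigvecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:r]            # indices of the r largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale                 # shape (d, r)

def third_order_correlation(views, transforms):
    """Assumed illustration: average outer product of three projected views."""
    Z1, Z2, Z3 = (X @ U for X, U in zip(views, transforms))
    return np.einsum("ni,nj,nk->ijk", Z1, Z2, Z3) / Z1.shape[0]

rng = np.random.default_rng(0)
dims, r, n_unlabeled = [5, 8, 3], 2, 100           # hypothetical sizes

# One random rank-r PSD metric per heterogeneous domain (stand-ins for learned metrics).
metrics = []
for d in dims:
    A = rng.standard_normal((d, r))
    metrics.append(A @ A.T)
transforms = [transform_from_metric(M, r) for M in metrics]

# Distance under M equals squared Euclidean distance after projecting with U^T.
gaps = []
for M, U, d in zip(metrics, transforms, dims):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    projected = float(np.sum((U.T @ x - U.T @ y) ** 2))
    gaps.append(abs(mahalanobis_sq(x, y, M) - projected))

# Unlabeled samples observed in all three domains, projected to the shared r-dim space.
views = [rng.standard_normal((n_unlabeled, d)) for d in dims]
C = third_order_correlation(views, transforms)
print(max(gaps), C.shape)
```

Because every domain is mapped into the same r-dimensional space, samples with different original feature dimensions become directly comparable, which is what allows a single correlation tensor to couple more than two domains.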


Similar resources

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...


Manifold Regularized Transfer Distance Metric Learning

The performance of many computer vision and machine learning algorithms depends heavily on the distance metric between samples. It is necessary to exploit abundant side information, such as pairwise constraints, to learn a robust and reliable distance metric. In real-world applications, however, large quantities of labeled data are unavailable due to the high labeling cost. Transfer distance metri...


An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...


Multiclass Semi-supervised Boosting Using Different Distance Metrics

The goal of this thesis project is to build an effective multiclass classifier which can be trained with a small amount of labeled data and a large pool of unlabeled data by applying semi-supervised learning in a boosting framework. Boosting refers to a general method of producing a very accurate classifier by combining rough and moderately inaccurate classifiers. It has attracted a significant...


Exploring the Power of Heterogeneous

The big data challenge is one unique opportunity for both data mining and database research and engineering. A vast ocean of data are collected from trillions of connected devices in real time on a daily basis, and useful knowledge is usually buried in data of multiple genres, from different sources, in different formats, and with different types of representation. Many interesting patterns can...



Publication date: 2016